Search CORE

817 research outputs found

DODO: an efficient orthologous genes assignment tool based on domain architectures. Domain based ortholog detection

Author: A Kuzniar
C Vogel
CE Storm
CE Storm
CM Zmasek
EV Kriventseva
EW Sayers
F Delsuc
G Ostlund
M Ashburner
M Bashton
M Levitt
M Pellegrini
M Remm
R Jothi
RD Finn
RD Finn
RL Tatusov
RT van der Heijden
Timothy H Wu
Ting-wen Chen
TJ Hubbard
Wailap V Ng
Wen-chang Lin
WM Fitch
WM Fitch
Z Fu
Z Fu
Publication venue: BioMed Central
Publication date: 01/10/2010

Abstract. Background: Orthologs are genes derived from the same ancestral gene locus after speciation events. Orthologous proteins usually have similar sequences and perform comparable biological functions. Therefore, ortholog identification is useful in annotating newly sequenced genomes. With the rapidly increasing number of sequenced genomes, constructing or updating ortholog relationships between all genomes requires considerable effort and computation time. In addition, elucidating ortholog relationships between distantly related genomes is challenging because of their lower sequence similarity. Therefore, an efficient ortholog detection method that can deal with a large number of distantly related genomes is desired. Results: In this study, an efficient ortholog detection pipeline, DODO (DOmain based Detection of Orthologs), is created on the basis of domain architectures. Supported by domain composition, which is usually directly related to protein function, DODO can facilitate ortholog detection across distantly related genomes. DODO works in two main steps. Starting from domain information, it first assigns proteins to groups according to their domain architectures and then identifies orthologs within those groups, with much reduced complexity. DODO is shown to detect orthologs between two genomes in a considerably shorter time than the traditional reciprocal-best-hits method, and the advantage is more pronounced when a large number of genomes is analyzed. The output of DODO is highly comparable with other known ortholog databases. Conclusions: DODO provides a new, efficient pipeline for detecting orthologs in a large number of genomes. In addition, a database established with DODO is easier to maintain and can be updated with relatively little effort. The DODO pipeline can be downloaded from http://140.109.42.19:16080/dodo_web/home.htm
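The first of the two steps described in this abstract, grouping proteins by domain architecture before any pairwise comparison, can be illustrated with a minimal sketch. This is not the DODO implementation: the function names, the input layout (protein ID mapped to an ordered list of domain accessions), and the choice to treat domain order as significant are all assumptions made for illustration.

```python
from collections import defaultdict


def group_by_architecture(proteins):
    """Map each ordered domain architecture to the protein IDs that carry it.

    `proteins` is a hypothetical input layout: {protein_id: [domain, ...]}.
    """
    groups = defaultdict(list)
    for protein_id, domains in proteins.items():
        # Assumption: the ordered tuple of domains defines the architecture.
        groups[tuple(domains)].append(protein_id)
    return dict(groups)


def candidate_pairs(genome_a, genome_b):
    """List cross-genome protein pairs that share a domain architecture.

    Only these pairs would need a sequence-level comparison afterwards,
    which is where the reduction in complexity comes from.
    """
    groups_a = group_by_architecture(genome_a)
    groups_b = group_by_architecture(genome_b)
    pairs = []
    for architecture in sorted(groups_a.keys() & groups_b.keys()):
        for a in groups_a[architecture]:
            for b in groups_b[architecture]:
                pairs.append((a, b))
    return pairs


# Placeholder accessions; the protein IDs are invented for the example.
genome_a = {"a1": ["PF00069"], "a2": ["PF00069", "PF00017"], "a3": ["PF00001"]}
genome_b = {"b1": ["PF00069"], "b2": ["PF00017", "PF00069"]}
print(candidate_pairs(genome_a, genome_b))
```

In this toy input only `a1` and `b1` share an architecture, so a single candidate pair remains out of the six possible cross-genome pairs. The second step (identifying orthologs within each group) is not sketched here because the abstract does not specify it.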

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Partial Homology Relations - Satisfiability in terms of Di-Cographs

Author: A Brandstädt
AM Altenhoff
AM Altenhoff
AM Altenhoff
C Crespelle
C Dessimoz
DG Corneil
DG Corneil
F Chen
F Gurski
G Östlund
J Engelfriet
J Sukumaran
JG Lawrence
K Hartmann
K Trachana
M Hellmuth
M Hellmuth
M Hellmuth
M Hellmuth
M Lafond
M Lafond
M Lafond
M Lechner
M Lechner
M Ravenhall
R Dondi
RL Tatusov
RM McConnell
S Böcker
WM Fitch
Y Gao
Y Liu
Publication date: 03/05/2018

Directed cographs (di-cographs) play a crucial role in the reconstruction of evolutionary histories of genes based on homology relations, which are binary relations between genes. A variety of methods based on pairwise sequence comparisons can be used to infer such homology relations (e.g., orthology, paralogy, xenology). They are satisfiable if the relations can be explained by an event-labeled gene tree, i.e., they can simultaneously co-exist in an evolutionary history of the underlying genes. Every gene tree is equivalently interpreted as a so-called cotree that entirely encodes the structure of a di-cograph. Thus, satisfiable homology relations must necessarily form a di-cograph. The inferred homology relations might not cover each pair of genes and thus provide only partial knowledge of the full set of homology relations. Moreover, for particular pairs of genes, it might be known with a high degree of certainty that they are not orthologs (resp. paralogs, xenologs), which yields forbidden pairs of genes. Motivated by this observation, we characterize (partial) satisfiable homology relations with or without forbidden gene pairs, and provide a quadratic-time algorithm for their recognition and for the computation of a cotree that explains the given relations.
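As a simplified illustration of the cograph condition, the sketch below handles only the undirected case (a single symmetric relation such as orthology), where cographs are exactly the graphs with no induced path on four vertices (P4). It is a brute-force check, not the quadratic-time algorithm of the abstract, and it does not cover di-cographs, partial relations, or forbidden pairs. The function names are invented for the example.

```python
from itertools import permutations


def has_induced_p4(vertices, edges):
    """Return True if the undirected graph contains an induced path a-b-c-d."""
    edge_set = {frozenset(edge) for edge in edges}

    def linked(u, v):
        return frozenset((u, v)) in edge_set

    for a, b, c, d in permutations(vertices, 4):
        path_present = linked(a, b) and linked(b, c) and linked(c, d)
        no_chords = not (linked(a, c) or linked(a, d) or linked(b, d))
        if path_present and no_chords:
            return True
    return False


def is_cograph(vertices, edges):
    """A graph is a cograph exactly when it has no induced P4."""
    return not has_induced_p4(vertices, edges)


genes = ["g1", "g2", "g3", "g4"]
# A chain of orthology calls g1-g2-g3-g4 with no other pairs is an induced P4.
chain = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
print(is_cograph(genes, chain))
```

The chain example fails the test, which matches the intuition that such a pattern of pairwise calls cannot be explained by a single event-labeled tree. The check enumerates all ordered 4-tuples, so it is only suitable for small gene sets.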

arXiv.org e-Print Archive

Crossref

University of Southern Denmark Research Output

Automatically extracting functionally equivalent proteins from SwissProt

Author: A Amores
A Meyer
A Wagner
AA Akindahunsi
Andrew CR Martin
CH Wu
E Kretschmann
EJ Stellwag
EV Koonin
F Chen
GX Yu
II Artamonova
JM Hurst
KP O'Brien
LB Koski
Lisa EM McMillan
MC Lill
MY Galperin
RA Notebaart
RL Tatusov
RL Tatusov
S Shibata
SB Rice
SF Altschul
T Hulsen
T Hulsen
V Kunin
V van Noort
WM Fitch
Y Lee
Y Yaron
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

In summary, FOSTA provides an automated analysis of annotations in UniProtKB/Swiss-Prot, enabling groups of proteins already annotated as functionally equivalent to be extracted. Our results demonstrate that the vast majority of UniProtKB/Swiss-Prot functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/Swiss-Prot annotation. Most of these would have presented equal difficulties for manual interpretation of annotations. We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/Swiss-Prot format that would facilitate text-mining of UniProtKB/Swiss-Prot.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

UCL Discovery

PubMed Central

Enlighten

Side effects: substantial non-neutral evolution flanking regulatory sites

Author: CA Semple
Colin A. Semple
CP Ponting
CP Ponting
D Graur
E Kenigsberg
E Kenigsberg
Gregory S. Barsh
I Dunham
James G. D. Prendergast
JG Prendergast
MM Hoffman
S Neph
S Roy
WM Fitch
Publication venue: Public Library of Science (PLoS)
Publication date: 01/05/2013

Crossref

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Canalization of the evolutionary trajectory of the human influenza virus

Author: A Kucharski
A Monto
A Park
A Rambaut
Andrew Rambaut
B Adams
B Finkelman
C Viboud
CA Russell
D Weinreich
DJ Smith
E Goldstein
E O'Dea
E Volz
F Carrat
F Tria
G Chowell
G Hirst
IG Barr
J Gog
J Lin
J O'Brien
J Truscott
J Voeten
K Koelle
K Koelle
K Koelle
K Koelle
M Recker
Mercedes Pascual
MI Nelson
NM Ferguson
R Bush
S Bhatt
S Gould
S Kryazhimskiy
S Park
T Bedford
T Bedford
Trevor Bedford
V Gupta
WM Fitch
YI Wolf
Z Yang
Publication date: 19/11/2011

Since its emergence in 1968, influenza A (H3N2) has evolved extensively in genotype and antigenic phenotype. Antigenic evolution occurs in the context of a two-dimensional 'antigenic map', while genetic evolution shows a characteristic ladder-like genealogical tree. Here, we use a large-scale individual-based model to show that evolution in a Euclidean antigenic space provides a remarkable correspondence between model behavior and the epidemiological, antigenic, genealogical and geographic patterns observed in influenza virus. We find that evolution away from existing human immunity results in rapid population turnover in the influenza virus and that this population turnover occurs primarily along a single antigenic axis. Thus, selective dynamics induce a canalized evolutionary trajectory, in which the evolutionary fate of the influenza population is surprisingly repeatable and hence, in theory, predictable.

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Deep Blue Documents at the University of Michigan

The genome and transcriptome of Trichormus sp NMC-1: insights into adaptation to extreme environments on the Qinghai-Tibet Plateau

Author: A Stamatakis
A Zorina
AL Delcher
B Langmead
BA Methé
C Xie
DA Los
DJ Wright
EP Balskus
G Blanc
G Norsang
HÄ Suh
J Qi
J Qi
J Zhang
JF Hess
JI Carreto
JM Shick
JP Zehr
K Mavromatis
KS Siddiqui
L Li
L R
L Ran
M Borodovsky
M Dassanayake
M Li
M Suyama
N Myers
P Pereira
P Puigbò
P Rajaniemi
PH Sudmant
PM Shih
Q Qiu
Q Tang
R Cavicchioli
RC Edgar
RL Tatusov
S Richter
SP Singh
SP Singh
SP Singh
T De Bie
T Kaneko
T Kogej
T Shi
U Consortium
U Nübel
WM Fitch
Z Xu
Z Yang
Z Yang
ZA Cheviron
Publication venue: Springer Science and Business Media LLC
Publication date: 06/07/2016

The Qinghai-Tibet Plateau (QTP) has the highest biodiversity for an extreme environment worldwide, and provides an ideal natural laboratory to study adaptive evolution. In this study, we generated a draft genome sequence of the cyanobacterium Trichormus sp. NMC-1 from the QTP and performed whole-transcriptome sequencing under low temperature to investigate the genetic mechanism by which T. sp. NMC-1 adapted to this specific environment. Its genome sequence was 5.9 Mb with a G+C content of 39.2% and encompassed a total of 5362 CDS. A phylogenomic tree indicated that this strain belongs to the Trichormus and Anabaena cluster. Genome comparison between T. sp. NMC-1 and six relatives showed that functionally unknown genes occupied a much higher proportion (28.12%) of the T. sp. NMC-1 genome. In addition, functions of specific, significantly positively selected, expanded orthogroups, and differentially expressed genes involved in signal transduction, cell wall/membrane biogenesis, secondary metabolite biosynthesis, and energy production and conversion were analyzed to elucidate specific adaptation traits. Further analyses showed that the CheY-like genes, extracellular polysaccharide and mycosporine-like amino acids might play major roles in adaptation to harsh environments. Our findings indicate that sophisticated genetic mechanisms are involved in cyanobacterial adaptation to the extreme environment of the QTP.

Crossref

Institute of Hydrobiology, Chinese Academy Of Sciences

University of Bedfordshire Repository

Maximum Parsimony on Phylogenetic networks

Author: A Schrijver
AG Kluge
AWF Edwards
BM Moret
CT Nguyen
D Huson
D Sankoff
D Sankoff
G Jin
G Jin
J Hein
J Hein
JS Farris
JS Farris
L Foulds
L Nakhleh
L Nakhleh
Lavanya Kannan
W Day
Ward C Wheeler
WM Fitch
Publication venue: BioMed Central
Publication date: 01/01/2012

Abstract. Background: Phylogenetic networks are generalizations of phylogenetic trees that are used to model evolutionary events in various contexts. Several different methods and criteria have been introduced for reconstructing phylogenetic trees. Maximum Parsimony is a character-based approach that infers a phylogenetic tree by minimizing the total number of evolutionary steps required to explain a given set of data assigned on the leaves. Exact solutions for optimizing parsimony scores on phylogenetic trees have been introduced in the past. Results: In this paper, we define the parsimony score on networks as the sum of the substitution costs along all the edges of the network, and show that certain well-known algorithms that calculate the optimum parsimony score on trees, such as the Sankoff and Fitch algorithms, extend naturally to networks, barring conflicting assignments at the reticulate vertices. We provide heuristics for finding the optimum parsimony scores on networks. Our algorithms can be applied for any cost matrix that may contain unequal substitution costs of transforming between different characters along different edges of the network. We analyzed this for experimental data on 10 leaves or fewer with at most 2 reticulations and found that for almost all networks, the bounds returned by the heuristics matched the exhaustively determined optimum parsimony scores. Conclusion: The parsimony score we define here does not directly reflect the cost of the best tree in the network that displays the evolution of the character. However, when searching for the most parsimonious network that describes a collection of characters, it becomes necessary to add additional cost considerations to prefer simpler structures, such as trees over networks.
The parsimony score on a network that we describe here takes into account the substitution costs along the additional edges incident on each reticulate vertex, in addition to the substitution costs along the other edges which are common to all the branching patterns introduced by the reticulate vertices. Thus the score contains an in-built cost for the number of reticulate vertices in the network, and would provide a criterion that is comparable among all networks. Although the problem of finding the parsimony score on the network is believed to be computationally hard to solve, heuristics such as the ones described here would be beneficial in our efforts to find a most parsimonious network.
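For readers unfamiliar with the tree case that this work extends, the following is a minimal sketch of the classic Fitch bottom-up pass for one character on a rooted binary tree with unit substitution costs. It does not handle reticulate vertices or unequal cost matrices, so it is not the network method of the abstract. The nested-tuple tree encoding and the function name are assumptions made for the example.

```python
def fitch(tree):
    """Return (candidate_states, parsimony_score) for a rooted binary tree.

    Hypothetical encoding: a leaf is a one-character string giving its
    observed state; an internal node is a 2-tuple (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {tree}, 0

    left, right = tree
    left_states, left_score = fitch(left)
    right_states, right_score = fitch(right)

    shared = left_states & right_states
    if shared:
        # The children can agree on a state: no extra substitution needed.
        return shared, left_score + right_score
    # The children disagree: keep both options and count one substitution.
    return left_states | right_states, left_score + right_score + 1


tree = (("A", "C"), ("A", "G"))
states, score = fitch(tree)
print(states, score)
```

For the example tree, each cherry costs one substitution and the two cherries share state `A` at the root, so the score is 2 with root state set `{'A'}`.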

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A common root for coevolution and substitution rate variability in protein sequence evolution

Author: Alexey Drozdetskiy
BE Suzek
C. Kosiol
CO Wilke
D de Juan
D Penny
David T. Jones
DT Jones
EA Gaucher
EA Gaucher
F Rizzato
F. Morcos
HM Berman
JA Grahnen
JAG De Visser
Jesse D. Bloom
L Bromham
L Burger
M Arenas
M Ekeberg
M Heinig
M Łuksza
M. Weigt
MN Price
N Galtier
N Halabi
P Bak
P Rice
Premal Shah
RB Squires
RD Finn
S Cocco
S Huang
S. Q. Le
SC Lovell
SY Ho
TF Smith
The UniProt Consortium
TN Starr
U Bastolla
WM Fitch
WM Fitch
X Gu
Z Yang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

We introduce a simple model that describes the average occurrence of point variations in a generic protein sequence. This model is based on the idea that mutations are more likely to be fixed at sites in contact with others that have mutated in the recent past. Therefore, we extend the usual assumptions made in protein coevolution by introducing a time damping on the effect of a substitution on its surroundings, which makes correlated substitutions happen in avalanches localized in space and time. The model correctly predicts the average correlation of substitutions as a function of their distance along the sequence. At the same time, it predicts an among-site distribution of the number of substitutions per site that is highly compatible with a negative binomial, consistent with experimental data. The promising outcomes achieved with this model encourage the application of the same ideas in the field of pairwise and multiple sequence alignment.

Crossref

Sissa Digital Library

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes

Author: A Alexeyenko
A Hadgu
Aaron J. Mackey
AJ Enright
AJ Enright
CE Storm
CE Storm
Cecile Fairhead
CG Elsik
CM Zmasek
CM Zmasek
David S. Roos
DP Wall
EL Sonnhammer
EV Koonin
EV Koonin
F Chen
Feng Chen
H Hegyi
J Gouzy
J Magidson
JD Thompson
Jeroen K. Vermunt
JK Vermunt
JK Vermunt
KP O'Brien
L Li
LB Koski
M Remm
RF Doolittle
RL Tatusov
RL Tatusov
RL Tatusov
RL Tatusov
S Bandyopadhyay
S Henikoff
S Van Dongen
SF Altschul
SL Hui
T Hulsen
TF Deluca
WM Fitch
WM Fitch
Y Lee
Y Qu
Publication venue: Public Library of Science
Publication date: 01/01/2007

Orthology detection is critically important for accurate functional annotation, and has been widely used to facilitate studies on comparative and evolutionary genomics. Although various methods are now available, there has been no comprehensive analysis of performance, due to the lack of a genomic-scale ‘gold standard’ orthology dataset. Even in the absence of such datasets, the comparison of results from alternative methodologies contains useful information, as agreement enhances confidence and disagreement indicates possible errors. Latent Class Analysis (LCA) is a statistical technique that can exploit this information to reasonably infer sensitivities and specificities, and is applied here to evaluate the performance of various orthology detection methods on a eukaryotic dataset. Overall, we observe a trade-off between sensitivity and specificity in orthology detection, with BLAST-based methods characterized by high sensitivity, and tree-based methods by high specificity. Two algorithms exhibit the best overall balance, with both sensitivity and specificity > 80%: INPARANOID identifies orthologs across two species while OrthoMCL clusters orthologs from multiple species. Among methods that permit clustering of ortholog groups spanning multiple genomes, the (automated) OrthoMCL algorithm exhibits better within-group consistency with respect to protein function and domain architecture than the (manually curated) KOG database, and the homolog clustering algorithm TribeMCL as well. By way of using LCA, we are also able to comprehensively assess similarities and statistical dependence between various strategies, and evaluate the effects of parameter settings on performance. In summary, we present a comprehensive evaluation of orthology detection on a divergent set of eukaryotic genomes, thus providing insights and guides for method selection, tuning and development for different applications.
Many biological questions have been addressed by multiple tests yielding binary (yes/no) outcomes but no clear definition of truth, making LCA an attractive approach for computational biology.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Tilburg University Repository

Genome landscapes and bacteriophage codon usage

Author: A Eyre-Walker
A Grosberg
Aviv Regev
C Roessner
D Lubensky
David R. Nelson
E Haggard-Ljungquist
E Zuckerkandl
F Angly
F Sanger
G Bernardi
G Hatfull
G Jenkins
G Kudla
G Modiano
Grzegorz R. Kudla
H Akashi
H Brussow
H Ochman
J Drake
J Lawrence
J Lawrence
J Lobry
J Plotkin
J Weeks
JB Plotkin
JG Lawrence
Joshua B. Plotkin
JR Powell
Julius B. Lucks
K Dittmar
K Sahu
K Sahu
K Sau
K Sau
KB Zeldowich
M Francino
M Pedulla
M Sorensen
N Ashcroft
N Galtier
P Ingvarsson
P Sharp
P Sharp
R Debry
R Edwards
R Fisher
R Hendrix
R Hendrix
R Hendrix
R Hendrix
R Inman
R Juhala
S Altschul
S Gregory
S Karlin
T Ikemura
T Ikemura
T Ikemura
T Kunisawa
WM Fitch
Publication venue: Public Library of Science (PLoS)
Publication date: 14/08/2007

Across all kingdoms of biological life, protein-coding genes exhibit unequal usage of synonymous codons. Although alternative theories abound, translational selection has been accepted as an important mechanism that shapes the patterns of codon usage in prokaryotes and simple eukaryotes. Here we analyze patterns of codon usage across 74 diverse bacteriophages that infect E. coli, P. aeruginosa and L. lactis as their primary host. We introduce the concept of a 'genome landscape,' which helps reveal non-trivial, long-range patterns in codon usage across a genome. We develop a series of randomization tests that allow us to interrogate the significance of one aspect of codon usage, such as GC content, while controlling for another aspect, such as adaptation to host-preferred codons. We find that 33 phage genomes exhibit highly non-random patterns in their GC3-content, use of host-preferred codons, or both. We show that the head and tail proteins of these phages exhibit significant bias towards host-preferred codons, relative to the non-structural phage proteins. Our results support the hypothesis of translational selection on viral genes for host-preferred codons, over a broad range of bacteriophages.
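The GC3-content mentioned in this abstract is the fraction of codons whose third position is G or C. The sketch below computes it for a single coding sequence. It is an illustration of the quantity only, not the authors' pipeline; the function name is invented, and counting every complete codon (including a stop codon, if present) is an assumption.

```python
def gc3_content(cds):
    """Fraction of complete codons whose third base is G or C.

    Assumptions: `cds` is an in-frame DNA string starting at codon position 1;
    trailing bases that do not form a complete codon are ignored.
    """
    sequence = cds.upper()
    usable_length = len(sequence) - len(sequence) % 3
    third_bases = sequence[2:usable_length:3]
    if not third_bases:
        raise ValueError("sequence contains no complete codon")
    gc_count = sum(base in "GC" for base in third_bases)
    return gc_count / len(third_bases)


# Codons ATG GCC AAA TAG have third bases G, C, A, G.
print(gc3_content("ATGGCCAAATAG"))
```

For the example sequence, three of the four third-position bases are G or C, giving 0.75. Averaging this value over a sliding window of genes would be one way to look at variation along a genome, though the abstract does not describe how its 'genome landscape' is constructed.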

arXiv.org e-Print Archive

CiteSeerX

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central